Registration of Functional Data Using Fisher-Rao 

Metric 



t 



A. Srivastava^ W. Wu^ S. Kurtek^ E. Klassen*, and J. S. Marron* 
Dept. of Statistics & ^ Dept. of Mathematics, Florida State University, 
* Dept. of Statistics, University of North CaroUna 



Abstract 

We introduce a novel geometric framework for separating the phase and the amplitude 
variability in functional data of the type frequently studied in growth curve analysis. This 
framework uses the Fisher-Rao Riemannian metric to derive a proper distance on the quotient 
space of functions modulo the time-waiping group. A convenient squai^e-root velocity function 
(SRVF) representation transfomis the Fisher-Rao metric into the standai^d metric, simpli- 
fying the computations. This distance is then used to define a Karcher mean template and 
warp the individual functions to align them with the Karcher mean template. The strength of 
this framework is demonstrated by deriving a consistent estimator of a signal observed under 
random waiping, scaling, and vertical translation. These ideas are demonstrated using both 
simulated and real data from different application domains: the Berkeley growth study, hand- 
written signature curves, neuroscience spike trains, and gene expression signals. The proposed 
method is empirically shown to be be superior in performance to several recently published 
methods for functional alignment. 

1 Introduction 

The problem of statistical analysis in function spaces is important in a wide variety of applications 
arising in nearly every branch of science, ranging from speech processing to geology, biology and 
chemistry. One can easily encounter a problem where the observations are real-valued functions 
on an interval, and the goal is to perform their statistical analysis. By statistical analysis we mean 
to compare, align, average, and model a collection of such random observations. These problems 
can, in principle, be addressed using tools from functional analysis, e.g. using the Hilbert 
structure of the function spaces, where one can compute distances, cross-sectional (i.e. point- 
wise) means and variances, and principal components of the observed functions [16]. However, a 
serious challenge arises when functions are observed with flexibility or domain warping along the 
X axis. This warping may come either from an uncertainty in the measurement process, or may 
simply denote an inherent variability in the underlying process itself that needs to be separated from 
the variability along the y axis (or the vertical axis), such as variations in maturity in the context 
of growth curves. As another possibility, the warping may be introduced as a tool to horizontally 
align the observed functions, reduce their variance and increase parsimony in the resulting model. 
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phase components warping functions 




Figure 1 : Separation of phase and amplitude variability in function data. 



We will call these functions elastic functions, keeping in mind that we allow only the x-axis (the 
domain) to be warped and the y-values to change only consequentially. 

Consider the set of functions shown in the top-left panel of Fig. [TJ These functions differ from 
each other in both heights and locations of their peaks and valleys. One would like to separate the 
variability associated with the heights, called the amplitude variability, from the variability associ- 
ated with the locations, termed the phase variability. Extracting the amplitude variability implies 
temporally aligning the given functions using nonlinear time warping, with the result shown in the 
bottom right. The corresponding set of warping functions, shown in the top right, represent the 
phase variability. The phase component can also be illustrated by applying these warping func- 
tions to the same function, also shown in the top right. The main reason for separating functional 
data into these components is to better preserve the structure of the observed data, since a separate 
modeling of amplitude and phase variability will be more natural, parsimonious and efficient. 

As another, more practical, example we consider the height evolution of subjects in the famous 
Berkeley growth datalll Fig. [8]shows the time derivatives of the growth curves, for female and male 
subjects, to highlight periods of faster growth. Although the growth rates associated with different 
individuals are different, it is of great interest to discover broad common patterns underlying the 
growth data, particularly after aligning functions using time warping. Thus, one would like an 
automated technique for alignment of functions. Section |5] shows examples of data sets from the 
other applications studied in this paper, including handwriting curves, gene expression signals, and 
neuroscience spike trains. 

In some applications it may be relatively easy to decide how to warp functions for a proper 
alignments. For instance, there may be some temporal landmarks that have to be aligned across 
observations. In that case the warping functions can be piecewise smooth (e.g. linear) functions 
that ensure that the landmarks are strictly aligned. This situation requires a manual specification 
of landmarks which can be a cumbersome process, especially for large datasets. In some other 
cases there may be some natural models that can be adopted for the warping functions. However, 

'http://www.psych.mcgill.ca/faculty/ramsay/datasets.html 
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in general, one does not have such landmarks or natural warping functions, and needs a compre- 
hensive framework where the alignment of observed functions is performed automatically in an 
unsupervised fashion. We seek a principled framework that will automatically estimate domain 
warpings of the observed functions in order to optimally align them. The two main goals of this 
paper are: 

1. Joint Alignment and Comparison (Section 3): There are two distinct steps in the analysis 
of elastic functions: (1) warpings or registration of functions and (2) their comparison. An 
important requirement in our framework is that these two processes, warping and compar- 
ison, are performed in a single, unified framework, i.e. under a single objective function, 
as for example was done in ifTTl . A fundamental idea is to avoid treating warping as a pre- 
processing step where the individual functions are warped according to an objective function 
that is different from the metric used to compare them. 

2. Signal Estimation Under Random Scales, Translations, and Warpings (Section 4): An 
application of this framework is in estimation of a signal under the following observation 
model. Let fi be an observation of a function g under random scaling, random vertical 
translation, and random warping, and we seek an estimator for g using {/j, z = 1, 2, . . . , n}. 
We will use this estimator for performing the alignment mentioned in the previous item. 

Before we introduce our framework that achieves these goals, we present a brief summary of some 
past methods, and their strengths and limitations. 



1.1 Past Techniques 

There exists a large literature on statistical analysis of functions, in part due to the pioneering 
efforts of Ramsay and Silverman [[T6l . Kneip and Gasser ifTOl . and several others [[T4ll22] . When 
restricting to the analysis of elastic functions, the literature is relatively recent and limited [[T5l |5] 
[T4l|22l[TTl. There are basically two categories of the past papers on this subject. One set treats the 
problem of functional alignment or registration as a pre-processing step. Once the functions are 
aligned, they are analyzed using the standard tools from functional analysis, e.g the cross-sectional 
mean and covariance computation and PCA. The second set of papers study both comparison and 
analysis jointly, using energy-minimization procedures. Although the latter generally provides 
better results due to a joint solution, the choice of the energy function deserves careful scrutiny. 
As an example for the first set, in [14] , the authors use warping functions that are convex com- 

/ {'\f'^''\s)\vdsY'^ 

binations of functions of the type: 7i(t) = ^fr-ju) , where u and p are two parameters, 

with the recommended values being 1^ = and p = 1. Then, the warped functions {/j o 7^} are 
analyzed using standard statistical techniques under the Hilbert structure of square-intergable func- 
tions. Similarly, James []6l uses moment-based matching for aligning functions, followed up by the 
standard FPCA. The main problem with this approach is that the objective function for alignment 
is unrelated to the metric for comparing aligned functions. The two steps are conceptually disjoint 
and a change in the objective function for alignment may change the subsequent results. 

We introduce some additional notation. Let T be the set of orientation-preserving diffeomor- 
phisms of the unit interval [0,1]: T = {7 : [0,1] [0, 1]|7(0) = 0, 7(1) = 1, 7 isadiffeo}. 
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Elements of T form a group, i.e. (1) for any 71, 72 G T, their composition 71 o 72 e F; and (2) for 
any 7 G F, its inverse 7^^ G F, where the identity is the self-mapping 7id(t) = t. The role of F 
in elastic function analysis is paramount. Why? For a function f E J', where is an appropriate 
space of functions on [0, 1] (defined later), the composition / o 7 denotes the re-parameterization 
or a domain warping of / using 7. Therefore, F is also referred to as the re-parameterization or 
the warping group. In this paper we will use ||/|| to denote (/q^ |/(t)p(it)^/^, i.e., the standard 
norm on the space of real- valued functions on [0, 1]. A majority of past methods study the problem 
of registration and comparisons of functions, either separately or jointly, by solving: 

inf||/i-(/2 07)11 (1) 

The use of this quantity is problematic because it is not symmetric. The optimal alignment of /i 
to /2 gives a different minimum, in general, when compared to the optimal alignment of /2 to /i. 
One can enforce a symmetry in Eqn. [1] using a double optimization, i.e. by seeking a solution to 
the problem inf ,y2)Grxr || (/i ° 7i) — (/2 ° I2) \\ ■ However, this is a degenerate problem. Another 
way of ensuring symmetry is to solve: inf^gr || /i — (/2 o 7) || + inf -ygr || /2 — (/i o 7) || • While this 
is symmetric, it still does not lead to a proper distance on the space of functions. 

The basic quantity in Eqn. [T]is commonly used to form objective functions of the type: 

Ea,.[z/]= inf (||(/,o7,)-z/f + A 7^(7^)) , 2 = 1, 2, . . . , n , (2) 

where 7?. is a smoothness penalty on the 7jS to keep them close to 7i(i(t) = t. The optimal 7* are 
then used to align the /^s, followed by a cross-sectional analysis of the aligned functions. This 
procedure, once again, suffers from the problem of separation between the registration and the 
comparison steps. Another issue here is: What should u be? It seems natural to use the cross- 
sectional mean of /jS but that choice is problematic both empirically and conceptually (more on 
that later). Tang and Miiller [|22]| use u = fj, obtain a set of pairwise warping functions 7jj for each 
i, and average them to form the warping function for /j. Kneip and Ramsay IfTTH take a template- 
based approach and use a different v for each i, given by z/j = I]^=i c^ij- Here, the ^^s are certain 
basis elements that are also estimated from the data and, in turn, relate to the principal components 
of the observations. Although this formulation has the nice property of solving for the registration 
and the principal components simultaneously, it implicitly uses the quantity in Eqn. [T]to compute 
the residuals. 

1.2 Proposed Approach 

We are going to take a differential geometric approach that provides a natural and fundamental 
framework for alignment of elastic functions. This approach is motivated by recent developments 
in shape analysis of parametrized curves [|27l 1211 . The use of elastic functions for analysis of 
variance and clustering has also been studied in [|9l and for analysis of spike train data in f26|. 

It is problematic to use the cross-sectional mean of {fi} in Eqn. [2]for finding optimal align- 
ments. To understand this issue, consider the following estimation problem. Let /j = Q(5'0 7j) + ej, 
i = 1,2, ... ,n, represent observations of a signal g E J-' under random warpings 7^ G F, scalings 
Ci E IR+ and vertical translations G M, and we seek an estimator for g given {fi}. Note that esti- 
mation of g is equivalent to the alignment of /jS since, given g, one can estimate 7jS and compute 
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fi o to align them. So, we focus on deriving an estimator for g. In this context, it is easy to 
see that the cross-sectional mean for {fi} is not an estimator of g. In fact, we claim that to derive 
an estimator for g it is more natural to work in the quotient space J-'/T rather than itself. This 
quotient space is the set of orbits of the types [/] = {(/07) I7 G T}. We will show that the Karcher 
mean of the orbits { [fi] } is a consistent estimator of the orbit of g and that a specific element of 
that mean orbit, selected using a pre-determined criterion, is a consistent estimator of g. 

Now, the definition of Karcher mean requires a proper distance on J^/F. The quantity in Eqn. 
[H cannot be used since ||/i — /2II 7^ ||(/i 07) — if 2 ° for general /i, f^ ^ T and 7 G F. 
(This point was also noted by Vantini ll23l although the solution proposed ifTSl . restricting to only 
the linear warpings, is not for general use.) Instead, we use dp^, the distance resulting from the 
Fisher-Rao Riemannian metric, since the action of F is by isometrics under that metric. That 
is, dpnifi, /2) = dpji{fi o 7, /a o 7), for all /i, /2, and 7. Fisher-Rao Riemannian metric was 
introduced in 1945 by C. R. Rao [[T71 where he used the Fisher information matrix to compare 
different probability distributions. This metric was studied rigorously in the 70s and 80s by Amari 
[[ll, Efron [|4l, Kass |l8||, Cencov lO, and others. While those earlier efforts were focused on 
analyzing parametric families, we use the nonparametric version of the Fisher-Rao Riemannian 
metric in this paper. (This nonparametric form has found an important use in shape analysis of 
curves [El].) An important attribute of this metric is that it is preserved under warping, and Cencov 
jSl showed that it is the only metric with this attribute. It is difficult to compute the distance dpR 
directly under this metric but Bhattacharya [^2] introduced a square-root representation that greatly 
simplifies this calculation. We will modify this square-root representations for use with more 
general functions. 



2 Function Representation and Metric 

In order to develop a natural and efficient framework for aligning elastic functions, we introduce a 
square-root representation of functions. 

2.1 Representation Space of Functions 

Let / be a real- valued function on the interval [0, 1]. We are going to restrict to those / that are 
absolutely continuous on [0, 1]; let F denote the set of all such functions. We define a mapping: 

'VII 11^ jsjQte tj^at Q is a continuous map. For 
otherwise 

the purpose of studying the function /, we will represent it using a square-root velocity function 
(SRVF) defined as q : [0, 1] M, where q{t) = = f{t)/^\f{t)\. This representation 

includes those functions whose parameterization can become singular in the analysis. It can be 
shown that if the function / is absolutely continuous, then the resulting SRVF is square integrable. 
Thus, we will define L^([0, 1],M) (or simply L^) to be the set of all SRVFs. For every g G 
there exists a function / (unique up to a constant, or a vertical translation) such that the given q 
is the SRVF of that /. In fact, this function can be obtained precisely using the equation: f{t) = 
/(O) + /q q{s)\q{s)\ds. Thus, the representation / (/(O), q) is invertible. 

If we warp a function / by 7, how does its SRVF change? The SRVF of / o 7 is given by: 
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q{t) = = il ° l){'t)^Ji{t)■ We will denote this transfomiation by (g, 7) = (g o 

V I dt 

The motivations for using SRVF for functional analysis are many and to understand these merits 
we first present the relevant metric. 



2.2 Elastic Riemannian Metric 

In this paper we will use the Fisher-Rao Riemannian metric for analyzing functions. We remind 
the reader that a Riemmanian metric is a smoothly-varying inner product defined on the tangent 
spaces of the manifold. 

Definition 1 For any ] ^ T and v^^ V2 G Tf{J^), where Tf{T) is the tangent space to T at f, the 
Fisher-Rao Riemannian metric is defined as the inner product: 

{{vi,V2))f = \ f V^{t)v2{t)J—dt . (3) 

^ 4 Jo \f{t)\ 

In case we are dealing only with functions such that f{t) > 0, e.g. cumulative distribution func- 
tions or growth curves, then we obtain a more classical version of the Fisher-Rao metric. Thus, the 
above definition is a more general form of the Fisher-Rao metric, the one that deals with signed 
functions instead of just density functions. 

This metric has many fundamental advantages, including the fact that it is the only Riemannian 
metric that is invariant to the domain warping and has played an important role in information 
geometry. This metric is somewhat complicated since it changes from point to point on J^, and it is 
not straightforward to derive equations for computing geodesies in J^. For instance, the geodesic 
distance between any two points /i, /2 G J-" is based on finding a geodesic path between them 
under the F-R metric. This minimization is non-trivial and only some numerical algorithms are 
known to attempt this problem. Once we find a geodesic path connecting fi and /2 in J^, its 
length becomes the geodesic distance dpR- However, a small transformation provide an enormous 
simplification of this task. This motivates the use of SRVFs for representing and aligning elastic 
functions. 

Lemma 1 Under the SRVF representation, the Fisher-Rao Riemannian metric becomes the stan- 
dard metric. 

Proof is given in the appendix. This result can be used to compute the distance dpB, between any 
two functions as follows. Simply compute the distance between the corresponding SRVFs and 
set dpR to that value: dpnifi, /2) = — Q2\\- The next question is: What is the effect of warping 
on dpj^l This is answered by the following result. 

Lemma 2 For any two SRVFs qi, q2 E and'j E T, ||(gi, 7) — {q2, 7)|| = \\qi — q2\\- 

See the appendix for the proof. In the case of functions with the non-negativity constraint (that is, 
/ > 0), this transformation was used by Bhattacharya [0. 
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Table 1 . Bijective Relationship Between Function Space T and SRVF space \2 



Item 


Funciion opace j 


oKVF opace iLi 


Kepreseniaiion 


J 


[QiJ[^)) 


Transfomiation 


f{t) = f{0) + Ioq{sMs)\ds 


qit) = fit)N\fit)\ 


Metric 


Fisher-Rao Metric 


Metric 




{{V1,V2))^ = Jo Vi{t)v2{t)jj^^dt 


(Wl,W2) = Jq Wi(t)w2{t)dt 


Distance 


dpRifl, /2) 


Iki -?2|| 


Isometry 


dpRifl, f2) = dpRifi o 7, /2 o 7) 


Iki - 92II = ||(gi,7) - (?2,7) 1 


Geodesic 


Numerical Solution 


Straight Line 


Elastic Distance 
between /i and /2 


d = inf^gr c^faI/i, /2 7) 


d = inf^gr - 07)^7) II) in 5 
Solved Using Dynamic Programming 



2.3 Elastic Distance on Quotient Space 

So far we have defined the Fisher-Rao distance on and have found a simple way to compute it 
using SRVFs. But we have not involved any warping function in the distance calculation and thus 
it represents a non-elastic comparison of functions. The next step is to define an elastic distance 
between functions as follows. The orbit of an SRVF g G is given by: [q] = closure{(g, 7)17 G 
r} = closure{(g o 7)v^)|7 G T}. It is the set of SRVFs associated with all the warpings of a 
function, and their limit points. Any two elements of [q] represent functions which have the same 
y variability but different x variability. Let S denote the set of all such orbits. To compare any two 
orbits we need a metric on S. We will use the Fisher-Rao distance to induce a distance between 
orbits, and we can do that only because under this the action of T is by isometrics. 

Definition 2 For any two functions fi, f2 ^ and the corresponding SRVFs, gi,g2 ^ L^, we 
define the elastic distance d on the quotient space S to be: d{\qi\, [^2]) = inf^gr H^i — (q'2, 7) ||- 

Note that the distance d between a function and its domain-warped version is zero. However, it can 
be shown that if two SRVFs belong to different orbits, then the distance between them is non-zero. 
Thus, this distance d is a proper distance (i.e. it satisfies non-negativity, symmetry, and the triangle 
inequality) on S but not on itself, where it is only a pseudo-distance. 

Table 1 provides a quick summary of relationships between the Fisher-Rao metric and F on 
one hand, and the metric and the space of SRVFs on the other. 

3 Karcher Mean and Function Alignment 

An important goal of this warping framework is to align the functions so as to improve the matching 
of features (peaks and valleys) across functions. A natural idea is to compute a cross-sectional 
mean of the given functions and then align the given functions to this mean template. The problem 
is that we do not have a proper distance function on L^, invariant to time warpings, that can be 
used to define a mean. But we have a distance function on the quotient space S, so we will use a 
mean on that space to derive a template for function alignment. We will do so in two steps: First, 
for a given collection of functions /i, /2, . . . , /„, and their SRVFs gi, g2, • • • , we compute the 
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mean of the corresponding orbits [gi], [^2], • • • , [qn] in the quotient space S; we will call it 
Next, we compute an appropriate element of this mean orbit to define a template /i^ in L^. Then, 
the alignment of individual functions comes from warping their SRVFs to match the template /i„ 
under the elastic distance. 

We remind the reader that if dist denotes the geodesic distance between points on a Riemannian 
manifold M, and {pi,i = 1, 2, . . . , n} is a collection of points on M, then a local minimizer of 
the cost function p i-)- J2^^i dist{p,piY is defined as the Karcher mean of those points [|71. It is 
also known by other names such as the intrinsic mean or the Frechet mean. The algorithm for 
computing a Karcher mean is based on gradients and has become a standard procedure in statistics 
on nonlinear manifolds (see, for example [12J). We will not present the general procedure but will 
describe its use in our problem. 

3.1 Karcher Mean of Points in T 

In this section we will define a Karcher mean of a set of warping functions {7t}, under the Fisher- 
Rao metric, using the differential geometry of V. Analysis on V is not straightforward because 
it is a nonlinear manifold. To understand its geometry, we will represent an element 7 G F by 
the square-root of its derivative ip = Note that this is the same as the SRVF defined earlier 
for elements of J-", except that 7 > here. The identity element 7^^ maps to a constant function 
with value z^jd(t) = 1. Since 7(0) = 0, the mapping from 7 to ^ is a bijection and one can 
reconstruct 7 from ^ using 7(t) = /q tp(syds. An important advantage of this transformation is 
that since WipW"^ = Jq ip{t)^dt = 'y{t)dt = 7(1) - 7(0) = 1, the set of all such ^s is §00, the 
unit sphere in the Hilbert space L^. In other words, the square-root representation simplifies the 
complicated geometry of F to the unit sphere. Recall that the distance between any two points 
on the unit sphere, under the Euclidean metric, is simply the length of the shortest arc of a great 
circle connecting them on the sphere. Using Lemma [U the Fisher-Rao distance between any two 
warping functions is found to be dpRiji, 72) = cos~^(/q^ ^Jji{t)yJ^2{t)dt). Now that we have a 
proper distance on F, we can define a Karcher mean as follows. 

Definition 3 For a given set of warping functions 71, 72, . . . , 7^ € F, define their Karcher mean 

to be 7„ = argmin^gr ELi dpRil, lif- 

The search for this minimum is performed using Algorithm 1 as follows: 
Algorithm 1: Karcher Mean of {74} Under dpR. 

Let ifji = v^Ti be the SRVFs for the given warping functions. Initialize /i^ to be one of the ^jS or 
use where w = ^ SILi i'i- 

1. For i = 1, 2, . . . , n, compute the shooting vector v-i = -^^^{i^i — cos(6'j)/i^), where 9i = 
cos-'^ {Jq IJ.^{t)^i{t)dt). 

2. Compute the average v = ^ SILi '^i- 

3. If \\v\\ is small, then stop. Else, update i-)- cos(e||w||)/x^ + sin(e||i;||)-p|| , for a small step 
size e > and return to Step 1. 

4. Compute the mean warping function using 7„ = /q fi^{s)'^ds. 
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3.2 Karcher Mean of Points in <S = LVr 



Next we consider the problem of finding means of points in the quotient space S. Since we already 
have a well-defined distance on S (given in Definition [2l), the definition of the Karcher mean 
follows. 

Definition 4 Define the Karcher mean of the given SRVF orbits {[qi]} in the space S as a 
local minimum of the sum of squares of elastic distances: 

n 

[n]n = argmin^ [g^])^ . (4) 

We emphasize that the Karcher mean is actually an orbit of functions, rather than a function. 
That is, if fiQ is a minimizer of the cost function in Eqn. HI then so is (/iq, 7) for any 7. The full 
algorithm for computing the Karcher mean in S is given next. 

Algorithm 2: Karcher Mean of {[gj]} in S 

1. Initialization Step: Select = qj, where j is any index in argmin^<j<„ I ki ~ ^ ^k=i Qk\\- 

2. For each q^ find 7* by solving: 7* = argmin^gp — (g^ o 7)-\/7||- The solution to this 
optimization comes from a dynamic programming algorithm. In cases where a solution does 
not exist in F, the dynamic programming algorithm still provides an approximation in F. 

3. Compute the aligned SRVFs using g^ H- (g^ o 'y*)\fy*. 

4. If the increment ||^ J2i=i Q^j ~ /^ll is small, then stop. Else, update the mean using fi 
^ Sr=i Qi return to step 2. 

The iterative update in Steps 2-4 is based on the gradient of the cost function given in Eqn. |4l 
Although we prove its convergence next, its convergence to a global minimum is not guaranteed. 
Denote the estimated mean in the A;th iteration by fi^^\ In the A;th iteration, let 7!*^'' denote the 
optimal domain warping from gj to fi^'^^ and let g[*''' = {qiO'^f^^)\f^ ■ Then, I]"=i d{\ijS^^], [qi]Y = 

ELi Wl^^^'^ - > Er=i 11/"^^+'^ - #¥ > T.l=id{[^i^^+% [qi]f. Thus, the cost function 

decreases iteratively and as zero is a natural lower bound, I]"=i d{ [/x*^^)] , [gj] ) ^ will always converge. 

3.3 Center of an Orbit 

The remaining task is to find a particular element of this mean orbit so that it can be used as a 
template to align the given functions. Towards this purpose, we will define the center of an orbit 
using a condition similar to past papers, see e.g. [|22l . which says that the mean of the warping 
functions should be the identity. A major difference here is that we use the Karcher mean and not 
the cross-sectional mean as was done in the past. 

Definition 5 For a given set of SRVFs gi, g2, . . . , gn and q, define an element q of [q] as the center 
of[q] with respect to the set {qi} if the warping functions {■Ji}, where 7^ = argmin^gp \\q — {qi, 7) ||) 
have the Karcher mean '-fid. 
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Figure 2: Finding center of the orbit [q] with respect to the set {q-i}. 
We will prove the existence of such an element by construction. 

Algorithm 3: Finding Center of an Orbit : WLOG, let q be any element of the orbit [q]. 

1. For each find 7^ by solving: 7^ = argmin^gp l)Vl\\)- 

2. Compute the mean 7„ of all {7j} using Algorithm 1. The center of [q] wrt {qi] is given by 

This algorithm is depicted pictorially in Fig. |2]We need to show that q resulting from Algorithm 
3 satisfies the mean condition in Definition [51 Note that 7^ is chosen to minimize ||g — (gj,7)||, 
and also that ||g - (9^,7) II = W^ln^) - (?i,7)ll = Ik - {(1^1° 7n)||- Therefore, 7* = % o 7-1 
minimizes ||g — (g.j,7)||. That is, 7* is a warping that aligns qi to q. To verify the Karcher mean 
of 7*, we compute the sum of squared distances Yh=i dpRil, llY = Yh=i dpRil-, 7i ° In^Y = 
Ya=i dpRil o 7n, 7j)^- As 7„ is already the mean of 7,, this sum of squares is minimized when 
7 = 7irf. That is, the mean of 7* is 7^^. 

We will apply this setup in our problem by finding the center of with respect to the given 
SRVFs {g,}. 

3.4 Complete Alignment Algorithm 

Now we can utilize the three algorithms. Algorithm 1-3, to present the full procedure for finding a 
template /i„ that is used to align the individual functions. 

Complete Alignment Algorithm: Given a set of functions /i, /2, . . . /n on [0, 1], let gi, ^2, • • • , Q'n 
denote their SRVFs, respectively. 

1 . Computer the Karcher mean of [qi] , [^2] , • • • , [qn] in <S using Algorithm 2. Denote it by 

2. Find the center of wrt {q^} using Algorithm 3; call it (Note that this algorithm 
requires a step for computing the Karcher mean of warping functions using Algorithm 1). 

3. For 2 = 1, 2, ... , n, find 7* by solving: 7* = argmin^gp ||/i„ - (g^, 7) ||. 

4. Compute the aligned SRVFs qi = {qi, 7*) and aligned functions /« = o 7*. 

5. Return the template the warping functions {7*}, and the aligned functions {/«}. 



10 



{fi} {fi} {7,*} mean ± std, before mean ± std, after 




Figure 3: Results on simulated data set 1. 



3.5 Simulation Results 

To illustrate this method we use a number of simulated datasets. Although our framework is 
developed for functions on [0, 1], it can easily be adapted to an arbitrary interval using a linear 
transformation. 

1. Simulated Data 1: As the first example, we study a set of simulated functions used previ- 
ously in [|TT1. The individual functions are given by: yi{t) = Zj,ie^^*~^'^^ /^ + Zj_2e~''*^^'^^ 

i = 1, 2, . . . , 21, where Zi^i and 2 are i.i.d normal with mean one and standard deviation 
0.25. Each of these functions is then warped according to: 7j(t) = Q{^-^-^^y^zy-^) — 3 if 
7^ 0, otherwise 7^ = %d, where Oj are equally spaced between —1 and 1, and the observed 
functions are computed using fi{t) = yii'jiit)). A set of 21 such functions forms the origi- 
nal data and is shown in the left panel of Fig. |3l and the remaining panels show the results 
of our method. The second panel presents the resulting aligned functions {fi} and the third 
panel plots the corresponding warping functions {7*}. The remaining panels show the cross- 
sectional mean and mean ± standard deviations of {fi} and {fi}, respectively. The plot of 
{fi} shows a tighter alignment of functions with sharper peaks and valleys. The two peaks 
are at —1.5 and 1.5 which is exactly what we expect. This means that the effects of warping 
generated by the 7jS have been completely removed and only the randomness from the y^s 
remains. Also, the plot of mean ± standard deviation shows a thinning of bands around the 
mean due to the alignment. 

2. Simulated Data 2: As a simple test of our method we analyze a set of functions with no 
underlying phase variability. To do this, we take {?/«}, as above, but this time we do not 
warp them at all; these functions are shown in the left panel of Figure IH Note that, by 
construction, the two peaks in these functions are always aligned, only their amplitudes are 
different. There is a slight misalignment in the valleys between the two peaks due to differing 
mixture weights. The result of the alignment process is shown in the remaining panels. The 
second panel shows that the aligned functions are very similar to the original data, except 
for a better alignment of the valleys. The next panel shows the estimated warping functions 
which are very close to the identity. The last panel shows the means of the original and the 
aligned functions and they are practically identical. 

3. Simulated Data 3: In this case we take a family of Gaussian kernel functions with the 
same shape but with significant phase variability, in the form of horizontal shifts, and minor 
amplitude variation. Figure |5] shows the original 29 functions {/»}, the aligned functions 
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{fi} {fi} {7*} mean, before and after 




Figure 4: Results on simulated data 2. 



{fi} {fi} {7*} mean ± std, before mean ± std, after 




Figure 5: Results on simulated data 3. 



{fi}, the warping functions {7*}, and the before-and-after cross sectional mean and standard 
deviations. Once again we notice a tighter alignment of functions with only minor variability 
left in {/i} reflecting the differing heights in the original data. The remaining two plots show 
that mean ± standard deviation of the aligned data is far more compact than the raw data. 

4. Simulated Data 4: In this case we take a family of multimodal wave functions with the 
same shape but different phase variations. The individual functions are defined on [0, 9] 
and given by: fi{t) = (1 — (7j(t)/9 — 0.5)^) sin(7r7j(t)), i = 1, 2, . . . , 9, with the warping 
functions 7i(t) = 9(^^^^^--p-) if 7^ 0, otherwise % = 'jid- Here are equally spaced 
between —1.5 and 1.5 with step size 0.375. Figure |6] shows the original 9 functions {fi}, the 
aligned functions {/,} (clearly showing the common shape), the warping functions {7^*}, 
and the before-and-after cross sectional mean and standard deviations, again showing the 
huge difference in apparent amplitude variation between aligned and unaligned functions. In 
particular, with only the phase variability in the data, our method has a perfect alignment of 
given functions. 



{fi} {fi} {7*} mean ± std, before mean ± std, after 




Figure 6: Results on simulated data 4. 
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4 Signal Estimation and Estimator Consistency 



In this section we justify the proposed framework by posing and solving a model-based estimation 
of alignment. Consider an observation model fi = Ci{g o '-f^) -\- a, i = 1, . . . ,ri, where g is 
an unknown signal, and q G M+, 7^ G F and Cj G J-" are random. We will concentrate on a 
simpler problem where the observation noise is set to a constant and, given the observations 
{fi}, the goal is to estimate the signal g or, equivalently, the warping functions {7i}. This or 
related problems have been considered previously by several papers, including ll25l [TSl . but we 
are not aware of any formal statistical solution. Here we show that the center resulting from 
the complete alignment algorithm, leads to a consistent estimator of g. The proofs of Lemmas and 
Corollary are given in the appendix. 

Theorem 1 For a function g, consider a sequence of functions fiif) = Cigipfiit)) + Cj, where Ci 
is a positive constant, is a constant, and 7^ is a time warping, i = 1, ■ ■ ■ ,n. Denote by qg 
and qi the SRVFs of g and fi, respectively, and let s = ^ J2i=i \f^i- Then, the Karcher mean of 
{[gi], i = 1, 2, . . . , n} in S is s[qg\. That is, 

[^]n = argmin ^ d^{\qi], [q]) = s[qg] = s{{qg, 7), 7 G T} . 
M Vi=i / 

We will prove this theorem in two steps. First we establish the following useful result. 

Lemma 3 For any qi,q2 G and a constant c > 0, we have argmin^gj. — (g2,7)|| = 
argmin^gr l|cgi - ('?2, 7) II- 

Corollary 1 For any function g G and constant c> 0, we have 7^^ G argmin^gp ||cg — (g, 7)||. 
Moreover, if the set {t G [0, l]|q'(t) = 0} has (Lebesgue) measure 0, •jid = argmin^gp ||cg — (g, 7)||. 

Now we get back to the proof of Theorem [T] The SRVF of the function /i = Cj((7 o 7^) + a is given 
by qi = ^/cli^qg, 7i), 2 = 1, ■ ■ ■ , n. For any q, we get 

d{[qi]A(l]) = c?([v/cl(gg,7,)], [g]) = inf ||v/Q(gg,7i) - (?,7)ll 

= inf liyclgg- (g,7 7»"^)|| = inf II v^cl^s - (9,7)11 = d{[\/ciqg],[q]). 

In the last line, the first equality is based on the isometry of the group action of F on and the 
second equality is based on the group structure of F. 

For any given g, let 7* G argmin^gp ||gg— (g, 7)||- Then, using LemmaS 7* G argmin^gp Wy/ciqg— 
(g, 7)||. Therefore, 

i=l i=l i=l 1=1 

The last inequality comes from the fact that sqg is simply the mean of {^/clqg} in space. The 
equality holds if and only if (g, 7*) = sqg or g = (sg^, 7*"^). 
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Actually, for any element of [sqg], say (sg^, 7) for any 7 e F, we have 

n n n 

i=l i=l 1=1 

Therefore, {{sqg, 7) I7 G F} is the unique solution to the Karcher mean = argminj^j Y17=i '^^ ( ilil All)- ^ 

Next, we present a simple fact about the Karcher mean of warping functions, where the Karcher 
mean is given in Definition [3l 

Lemma 4 Given a set {•ji G T\i = 1, n} and a 70 G F, if the Karcher mean 0/(74} is 7, then 
the Karcher mean of{ji o 70} /5 7 o 79. 

Theorem [H ensures that belongs to the orbit of [qg] (up to a scale factor) but we are inter- 
ested in estimating g itself, rather than its orbit. Since we can write g o sls {g o 7^) o (7"^ o j^), 
for any 7a G F, the function g is not identifiable unless we impose an additional constraint on 7^. 
The same goes for the random variables q and e^. Under the assumption that the population means 
of 7j"\ Cj, and Cj are known, we will show in two steps that Algorithm 3, that finds the center of 
the orbit [//]„, results in a consistent estimator for g. 

Theorem 2 Under the same conditions as in TheoremU\ let n = {sqg, 70), for 70 G F, denote an 
arbitrary element of the Karcher mean class = s['ig\- Assume that the set {t G [0, 1] \g{t) = 0} 
has Lebesgue measure zero. If the population Karcher mean o/{7j~^} is -^id, then the center of the 
orbit [ij\n, denoted by fin, satisfies lim„_5.oo /^n = E{s)qg. 

Proof: In Algorithm 3, we first compute % = argmin^ || (gj, 7) — = argmin^ || {y/cl{qg, 7i), 7) — 
(sgg,7o)|| = argmin^ ||(^gg,7i0707o'i)-sgg||. Since the set {t G [0,l]\g{t) = 0} has measure 
zero, the set {t G [0, l]\qg{t) = 0} also has measure zero. Using Corollary 1, this above distance 
is uniquely minimized when 7^07^0 7^ ^ = 7^^, or 7, = 7^"^ o 7q. Denote the Karcher mean of 
these warping functions {7j} by 7„. Applying the inverse of this 7„ to fi, we get fin = {fi, 7^^^)- 
As n — ^ 00, the Karcher mean of 7j converges to its population mean which, by Lemma |4l is 79. 
Thus, fin E(s)((gg,7o),7o"^) = E{s)qg. □ 

This result shows that asymptotically one can recover the SRVF of the original signal using 
the Karcher mean of the SRVFs of the observed signals. Of course, one is really interested in the 
signal g itself, rather than its SRVF. One can reconstruct g using aligned functions {/j} generated 
by the Alignment Algorithm in Section |3] As discussed above, we further assume the population 
mean of is known. 

Theorems Under the same conditions as in Theorem^ let 7* = argmin^ IKo'iiT) ~ A^nll (ind 
ft = fi°l*i- If we denote c=\ Er=i Ci ande= ^ J27=i ^i' ^^^^ lim„^oo ^ YJl=i fi = E{c)g+E{e). 

Proof: In the proof for Theorem IH 7^ = argmin^ || (g^, 7) - = argmin^ II (^i, 7) - (/^n., 7n) || • 
Hence 7* = argmin^ Wi.l) - l^n\\ = li°ln^ = li^ °lQ°ln^- This implies that fi = fiO 7* = 
{ci{g o 7i) + Ci) ° (7j~^ ° 7o o 7,^^) = Ci{g o 70 o 7-1) + a. As 7„, -> 70 when n 00, we have 

\im ^ E /. = E{c){g o 70) o 7-1 + E{e) = E{c)g + E{e). 
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g {fi} {fi} estimate of (yf error w.r.t. n 




0.5 1 0.5 1 0.5 1 0.5 1 5 10 20 30 40 50 



Figure 7: Example of consistent estimation. 



Illustration. We illustrate the estimation process using an example with 5 (t) = sin(57rt),t G [0, 1]. 
We randomly generate n = 50 warping functions {7,} such that {7^^^} are i.i.d with mean 74^. We 
also generate i.i.d sequences {q} and {ej} from the exponential distribution with mean 1 and the 
standard normal distribution, respectively. Then we compute functions fi = Ci{gori) + Cj to form 
the functional data. In Fig. Ul the first panel shows the function g, and the second panel shows the 
data {fi}. The Alignment Algorithm in Section|3]results in the aligned functions {fi = 07*} that 
are are shown in the third panel in Fig. |7] Using Theorem |3] the original signal g can be estimated 
by (^ELi fi - E{e))/E{c). In this case, E{c)) = l,E{e) = 0. This estimated g (red) as well 
as the true g (blue) are shown in the fourth panel. Note that the estimate is reasonably successful 
despite large variability in the raw data. Finally, we examine the performance of the estimator with 
respect to the sample size, by performing this estimation for n equal to 5, 10, 20, 30, and 40. The 
estimation errors, computed using the norm between estimated g's and the true g, are shown in 
the last panel. As expected from the earlier theoretical development, this estimate converges to the 
true g when the sample size n grows large. 

5 Experimental Evaluation of Function Alignment 

In this section we take functional data from several application domains and analyze them using 
the framework developed in this paper. Specifically, we will focus on function alignment and 
comparison of alignment performance with some previous methods on several datasets. 

5.1 Applications on real data 

We start with demonstrations of the proposed framework on some well known functional data. 

1. Berkeley Growth Data: As a first example, we consider the Berkeley growth dataset for 54 
female and 39 male subjects. For better illustrations, we have used the first derivatives of the 
growth curves as the functions {fi} in our analysis. (In this case, since SRVF is based on the first 
derivative of /, we actually end up using the second derivatives of the growth functions.) 

The results from our elastic function analysis on the female growth curves are shown in Fig. |8] 
(left side). The top-left panel shows the original data. It can be seen from this plot that while the 
growth spurts for different individuals occurs at slightly different times, there are some underlying 
pattems to be discovered. This can also be observed in the cross-sectional mean and mean ± stan- 
dard deviation plot in the bottom-left panel. In the second panel of the top row we show the aligned 
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Male Data 
original data aligned functions 
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Figure 8: Analysis of growth data. Top: original data and the aligned functions. Bottom: the 
corresponding cross-sectional mean and mean ± standard deviations. 



functions fi(t). The panel below it, which shows the cross-sectional mean and mean ± standard 
deviation, exhibits a much tighter alignment of the functions and, in tum, an enhancement of peaks 
and valleys in the aligned mean. In fact, this mean function suggests the presence of two growth 
spurts, one between 3 and 4 years, and the other between 10 and 12 years, on average. Similar 
analysis is performed on the male growth curves and Fig. |8](right) shows the results: the original 
data (consisting of 39 derivatives of the original growth functions), the aligned functions fi{t), and 
the corresponding cross-sectional means and means ± standard deviations. The cross-sectional 
mean functions also show a much tighter alignment of the functions and, in tum, an enhancement 
of peaks and valleys in the aligned mean. This mean function suggests the presence of several 
growth spurts, between: 3 and 5, 6 and 7, and 13 and 14 years, on average. 

2. Handwriting Signature Data: As another example of the data that can be effectively modeled 
using elastic functions, we consider some handwritten signatures and the acceleration functions 
along the signature curves. This application was also considered in the paper [11]. Let {x{t), y{t)) 
denote the x and y coordinates of a signature traced as a function of time t. We study the accel- 
eration functions f(t) = \Jx{ty + y{ty for different instances of the signatures and study their 
variability after alignment. 

The left panel in Fig. [9]shows the 20 acceleration functions of 20 signatures that are used in our 
analysis as {/«}. The corresponding cross-sectional mean and mean ± standard deviations before 
the alignment are shown in the next panel. The right two panels show the aligned functions /jS 
and the corresponding mean and mean ± standard deviations after the alignment. A look at the 
cross-sectional mean functions suggests that the aligned functions have much more exaggerated 
peaks and valleys, resulting from the alignment of these features due to warping. 

3. Neuroscience Spike Data: Time-dependent information is represented via sequences of stereo- 
typed spike waveforms in the nervous system. These waveform sequences (or spike trains) have 
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Figure 9: Analysis of signature profiles. 
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Figure 10: Analysis of spike train data. 



been commonly looked as the language of the brain and are the focus of much investigation. Before 
we apply our framework, we need to convert the spike information into functional data. Assume 
is a spike train with spike times < ti < ^2 < ■ ■ ■ < ^a/ < 7", where [0, 1] denotes the record- 
ing time domain. That is, s(t) = Yl!iL\ Sit — ti), t E [0, 1] , where 5(-) is the Dirac delta function. 
One typically smooths the spike trains to better capture the time correlation between spikes. In this 
paper we use a Gaussian kernel K{t) = e~*^/^^'^^ ^(v^^vra), a > (cr = 1ms here). That is, the 
smoothed spike train is f{t) = {s * K){t) = V*^, -^p-it-u^/i^c^^) 



2no- 



Figure [To] left panel shows one example of such smoothed spike trains for 10 trials of one neu- 
rons in the primary motor cortex of a Macaque monkey subject that was performing a squared-path 
movement [|26l . The next panel shows the cross-sectional mean and mean ± standard deviation of 
the functions in this neuron. The third panel shows {fi} where we see that the functions are well 
aligned with more exaggerated peaks and valleys. The next panel shows the mean and mean ± 
standard deviation. Similar to the growth data and signature data, an increased amplitude variation 
and decreased standard deviation are observed in this plot. 



original data mean ± std, before aligned functions mean ± std, after 
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Figure 1 1 : Analysis of gene expression data. 
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4. Gene Expression data: In this example, we consider temporal microarray gene expression 
profiles. The time ordering for yeast cell-cycle genes was investigated in lfT3l . and we use the 
same data in this study. The expression level was measured during a period of 119 minutes for a 
total of 612 fully -recorded genes. There are 5 clusters with respect to phases in these continuous 
function. In particular, 159 of these functions were known to be related to M phase regulation 
of the yeast cell cycle. These 159 functions used here are same as those used in (131, and are 
shown in the left panel in Fig. [TTJ Although, in general, gene expression analysis has many goals 
and problems, we use focus only on the subproblem of expression alignment as functional data. 
The corresponding cross-sectional mean and mean ± standard deviations before the alignment are 
shown in the next panel. The right two panels show the aligned functions /jS and the corresponding 
mean and mean ± standard deviations after the alignment. Once again we find a strong alignment 
of functional data with improved peaks and valleys. 



5.2 Comparisons with other Methods 

In this section we compare the results from our method to some of the past ideas where the soft- 
ware is available publicly. While we have compared our framework with other published work in 
conceptual terms earlier, in this section we focus on a purely empirical evaluation. In particular, 
we utilize several evaluation criteria for comparing the alignments of functional data in the several 
simulated and real datasets discussed in previous sections. The choice of an evaluation criteria is 
not obvious, as there is no single criterion that has been used consistently by past authors for mea- 
suring the quality of alignment. Thus, we use three criteria so that together they provide a more 
comprehensive evaluation. We will continue to use fi and fi, i = 1, N,to denote the original 
and the aligned functions, respectively. 

1. Least Squares: A cross-validated measure of the level of synchronization 

Is measures the total cross-sectional variance of the aligned functions, relative to the original 
value. The smaller the value of Is, the better the alignment is. 

2. Pairwise Correlation: It measures pairwise correlation between functions: 

where cc(/, g) is the pairwise Pearson's correlation between functions. The larger the value 
of pc, the better the alignment between functions in general. 

3. Sobolev Least Squares: This time we compute the least squares using the first derivative of 
the functions: 

This criterion measures the total cross-sectional variance of the derivatives of the aligned 
functions, relative to the original value, and is an alternative measure of the synchronization. 
The smaller the value of sis, the better synchronization the method achieves. 
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Method 


AUTC [14| 


PACE |22| 


SMRlH 


MBM H 


F-R 


Software 


Matlab 


Matlab 


Matlab 


R 


Matlab 


Gaussian kernel 


O.OTsec 


68sec 


7.7sec 


lOlsec 


25 sec 


Bimodal 


0.02sec 


80sec 


4.5sec 


150sec 


17 sec 


Growth-male 


0.03sec 


254sec 


14.5sec 


175sec 


22sec 


Signature 


0.02sec 


145 sec 


4.2sec 


1 17sec 


27 sec 



Table 1 : Computational cost of the five methods some datasets used in Fig. [121 



We compare our Fisher-Rao (F-R) method with the area under the curve (AUTC) method pre- 
sented in [|T4ll . the Tang-Miiller method [|22]| provided in principal analysis by conditional expec- 
tation (RACE) package, the self-modeling registration (SMR) method presented in [I5], and the 
moment-based matching (MBM) technique presented in Fig. [T2] summarizes the values of 
{ls,pc, sis) for these five methods using 3 simulated and 4 real datasets. From the results, we 
can see that the F-R method does uniformly well in functional alignment under all the evaluation 
metrics. We have found that the Is criterion is sometimes misleading in the sense that a low value 
can result even if the functions are not very well aligned. This is the case, for example, in the 
male growth data under SMR method. Here the Is = 0.45, while for our method Is = 0.64, even 
though it is easy to see that latter has performed a better alignment. On the other hand, the sis 
criterion seems to best correlate with a visual evaluation of the alignment. Sometimes all three 
criteria fail to evaluate the alignment performance properly, especially when preserving the shapes 
of the original signals are considered. This is the case in the first row of the figure where the AUTC 
method has the same values of Is, pc, and sis as our method but shapes of the individual functions 
have been significantly distorted. The wave function simulated data is the most challenging and no 
other method except ours does a good job. Another point of evaluation is the number of parameters 
used by different methods. While our method does not have any parameter to choose, the other 
methods involve choosing at least two but often more parameters which makes it challenging for 
a user to apply them in different scenarios. The computational costs associated with the different 
methods are given in Table 15^ This table is for some of the datasets used in our experiments and 
are representative of the general complexities of these methods. 



6 Discussion 

In this paper we have described a parameter-free approach for an automated alignment of given 
functions using time warping. The basic idea is to use the Fisher-Rao Riemannian metric and the 
resulting geodesic distance to define a proper distance, called elastic distance, between warping 
orbits of functions. This distance is used to compute a Karcher mean of the orbits, and a template 
is selected from the mean orbit using an additional condition that the mean of the warping functions 
is identity. Then, individual functions are aligned to the template using the elastic distance and a 
natural separation of the amplitude and phase variability of the function data is achieved. One 
interesting application of this framework is in estimating a signal observed under random time 
warpings. We propose the Karcher mean template as an estimator of the original signal and prove 
that it is a consistent estimator of the signal under some basic conditions on the random warping 
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Gene expression data 
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Figure 12: Empirical evaluation of five methods on 3 simulated datasets and 4 real datasets, with 
the alignment performance computed using three criteria {Is, pc, sis) . The best cases are shown in 
boldface. 
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functions. 

Important future directions in this work include: (1) the development of joint statistical models 
for the amplitude and phase components of the data, and (2) the use of such models for classifi- 
cation of observed functions into pre-determined classes. While the techniques for modeling the 
amplitude variability are quite common, e.g. using functional principal component analysis, the 
corresponding ideas for the phase component are relatively limited. The main reason is that F is a 
nonlinear manifold and one cannot directly apply FPCA here. We mention that some solutions to 
this problem have been presented in [[T9l l20l f24 \ albeit in different contexts. 
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A Proofs of Lemmas [11 and |2| 



Proof of Lemma [H The mapping from / to g is as follows: f(t) A f{t) — )• q(t). For any 

— Q 

V E Tf{F), the differential of this mapping is: v{t) A v{t) ^ w{t). To evaluate the expression 
for w, we need the expression for Q*. In case x > 0, we have Q{x) = ^/x and its directional 
derivative in the direction of y E M is y/(2-\/a;)- In case x < 0, we have Q{x) = —\J—x 
and its directional derivative in the direction of ?/ G M is y / {2\/—x). Combining the two, the 
directional derivative of Q is Q^^xiy) = y/(2-^/|a?[). Now, to apply this result to our situation, 
consider two tangent vectors vi,V2 E Tf{F), and define their mappings under as Wi{t) = 
Q* f(t) (^i(^)) = ^i(^) / (2-y/|/(t)|). Taking the inner-product between the resulting tangent vec- 
tors, we get: {wi{t) , W2{t)) = Wi{t)w2{t)dt = \ Jo Vi{t)v2{t)jjj^y^dt. The RHS is compared 
with Eqn. [3] to complete the proof. □ 

Proof of Lemma 111 For an arbitrary element 7 e F, and gi, q2 E L^, we have: ||(gi,7) — 

fe,7)f = loium)^) - i2m)^)rdt = j^'igMt)) - q2m))'7m = iigi - 

q2rn 

B Proofs of Lemma SI Corollary [H and Lemma HI 

Proof of Lemma m Using the definition: ||cgi — (g2,7)|P = Io{cqi{t) — {q2,l){t))'^dt = c^||gip + 
||g2p — 2c Jq qi{t){q2,'y){t)dt. Note that we have used ||(g2,7)||^ = ||'?2||^, an important fact, in 
the last equality. Thus, 

argmin||gi - (^2,7)11 = argmax / qi{t){q2,'y){t)dt = argmin||cgi - (52,7) II- ^ 

Proof of Corollary [H 7^^ E argmin^gp \\cq — (g, 7)|| follows directly from Lemma |3] since 7^^ 
minimizes ||g — (g, 7) || . Next we show that this minimizer is unique if the set \t E [0, 1] |g(t) = 0} 
has measure 0. In this case, if we define = /q q{s)'^ds, then F is a strictly increasing function 
on [0,1]. 

Using Lemma [3] again, we only need to show that 'jid is the unique minimizer for ||g — (g, 7) || . 
For any 7* E F that minimizes ||g — (g, 7)||, we have ||g — (g,7*)|| = Ik— (Q'77id)|| = 0. Therefore, 
q(t) = g(7*(t))y7*(t) (almost everywhere), and F(t) = j^qi^sYds = /o g(7*(s))^7*(s)c?s = 
lo qir^dr = F(7*(t)). As F is strictly increasing, we must have 7* = -yid- □ 

Proof of Lemma m First we observe that for any two 71,72 E F, we have c?irR(7i,72) = 
dpRili o 7,72 o 7) for any 7 G F. This comes directly from the isometry of the group action 
of F on itself (proof is similar to that of Lemma |2]). This implies that: argmin^ Y17=i dpRili ° 
7o, 7)^ = argmin^ Yli=i dpRilh 7 ° 7o ^)^- Let 7* denote the optimal 7 in the last term. Since 
argmin J27=i dpRili, = 1, this implies that 7* o 7"^ = 7 or 7* = 7 o 70. □ 
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